A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

نویسندگان

  • Ian En-Hsu Yen
  • Xin Lin
  • Jiong Zhang
  • Pradeep Ravikumar
  • Inderjit S. Dhillon
چکیده

Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those standard tools widely-used in Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition

Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], ...

متن کامل

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

متن کامل

An Approximate de Bruijn Graph Approach to Multiple Local Alignment and Motif Discovery in Protein Sequences

Motif discovery is an important problem in protein sequence analysis. Computationally, it can be viewed as an application of the more general multiple local alignment problem, which often encounters the difficulty of computer time when aligning many sequences. We introduce a new algorithm for multiple local alignment for protein sequences, based on the de Bruijn graph approach first proposed by...

متن کامل

Genetic Algorithm Based Probabilistic Motif Discovery in Unaligned Biological Sequences

Finding motif in biosequences is the most important primitive operation in computational biology. There are many computational requirements for a motif discovery algorithm such as computer memory space requirement and computational complexity. To overcome the complexity of motif discovery, we propose an alternative solution integrating genetic algorithm and Fuzzy Art machine learning approaches...

متن کامل

In Silico Analysis of Primary Sequence and Tertiary Structure of Lepidium Draba Peroxidase

Peroxidase enzymes are vastly applicable in industry and diagnosiss. Recently, we introduced a new kind of peroxidase gene from Lepidium draba (LDP). According to protein multiple sequence alignment results, LDP had 93% similarity and 88.96% identity with horseradish peroxidase C1A (HRP C1A). In the current study we employed in silico tools to determine, to which group of peroxidase enzymes LDP...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JMLR workshop and conference proceedings

دوره 48  شماره 

صفحات  -

تاریخ انتشار 2016